
    Artificial intelligence in orthopaedics: false hope or not? A narrative review along the line of Gartner's hype cycle

    Artificial Intelligence (AI) in general, and Machine Learning (ML)-based applications in particular, have the potential to change the scope of healthcare, including orthopaedic surgery. The greatest benefit of ML is in its ability to learn from real world clinical use and experience, and thereby its capability to improve its own performance. Many successful applications are known in orthopaedics, but have yet to be adopted and evaluated for accuracy and efficacy in patients' care and doctors' workflows. The recent hype around AI triggered hope for development of better risk stratification tools to personalize orthopaedics in all subsequent steps of care, from diagnosis to treatment. Computer vision applications for fracture recognition show promising results to support decision-making, overcome bias, process high-volume workloads without fatigue, and hold the promise of even outperforming doctors in certain tasks. In the near future, AI-derived applications are very likely to assist orthopaedic surgeons rather than replace us. 'If the computer takes over the simple stuff, doctors will have more time again to practice the art of medicine'.(76)

    Prediction of Postoperative Delirium in Geriatric Hip Fracture Patients: A Clinical Prediction Model Using Machine Learning Algorithms

    INTRODUCTION: Postoperative delirium in geriatric hip fracture patients adversely affects clinical and functional outcomes and increases costs. A preoperative prediction tool to identify high-risk patients may facilitate optimal use of preventive interventions. The purpose of this study was to develop a clinical prediction model using machine learning algorithms for preoperative prediction of postoperative delirium in geriatric hip fracture patients. MATERIALS & METHODS: Geriatric patients undergoing operative hip fracture fixation were queried in the American College of Surgeons National Surgical Quality Improvement Program database (ACS NSQIP) from 2016 through 2019. A total of 28 207 patients were included, of which 8030 (28.5%) developed a postoperative delirium. First, the dataset was randomly split 80:20 into a training and testing subset. Then, a random forest (RF) algorithm was used to identify the variables predictive for a postoperative delirium. The machine learning-model was developed on the training set and the performance was assessed in the testing set. Performance was assessed by discrimination (c-statistic), calibration (slope and intercept), overall performance (Brier-score), and decision curve analysis. RESULTS: The included variables identified using RF algorithms were (1) age, (2) ASA class, (3) functional status, (4) preoperative dementia, (5) preoperative delirium, and (6) preoperative need for mobility-aid. The clinical prediction model reached good discrimination (c-statistic = .79), almost perfect calibration (intercept = −.01, slope = 1.02), and excellent overall model performance (Brier score = .15). The clinical prediction model was deployed as an open-access web-application: https://sorg-apps.shinyapps.io/hipfxdelirium/. DISCUSSION & CONCLUSIONS: We developed a clinical prediction model that shows promise in estimating the risk of postoperative delirium in geriatric hip fracture patients. 
The clinical prediction model can play a beneficial role in decision-making for preventative measures for patients at risk of developing a delirium. If found to be externally valid, clinicians might use the available web-based application to help incorporate the model into clinical practice to aid decision-making and optimize preoperative prevention efforts

    Augmented and virtual reality in spine surgery, current applications and future potentials

    BACKGROUND CONTEXT: The field of artificial intelligence (AI) is rapidly advancing, especially with recent improvements in deep learning (DL) techniques. Augmented (AR) and virtual reality (VR) are finding their place in healthcare, and spine surgery is no exception. The unique capabilities and advantages of AR and VR devices include their low cost, flexible integration with other technologies, user-friendly features and their application in navigation systems, which make them beneficial across different aspects of spine surgery. Despite the use of AR for pedicle screw placement, targeted cervical foraminotomy, bone biopsy, osteotomy planning, and percutaneous intervention, the current applications of AR and VR in spine surgery remain limited. PURPOSE: The primary goal of this study was to provide spine surgeons and clinical researchers with general information about the current applications, future potentials, and accessibility of AR and VR systems in spine surgery. STUDY DESIGN/SETTING: We reviewed titles of more than 250 journal papers from Google Scholar and PubMed with search words: augmented reality, virtual reality, spine surgery, and orthopaedic, out of which 89 related papers were selected for abstract review. Finally, the full text of 67 papers was analyzed and reviewed. METHODS: The papers were divided into four groups: technological papers, applications in surgery, applications in spine education and training, and general applications in orthopaedics. A team of two reviewers performed paper reviews and a thorough web search to ensure that the most updated state of the art in each of the four groups was captured in the review. RESULTS: In this review we discuss the current state of the art in AR and VR hardware, their preoperative applications and surgical applications in spine surgery. Finally, we discuss the future potentials of AR and VR and their integration with AI, robotic surgery, gaming, and wearables.
CONCLUSIONS: AR and VR are promising technologies that will soon become part of standard of care in spine surgery. (C) 2021 Published by Elsevier Inc

    Does the SORG Orthopaedic Research Group Hip Fracture Delirium Algorithm Perform Well on an Independent Intercontinental Cohort of Patients With Hip Fractures Who Are 60 Years or Older?

    Background Postoperative delirium in patients aged 60 years or older with hip fractures adversely affects clinical and functional outcomes. The economic cost of delirium is estimated to be as high as USD 25,000 per patient, with a total budgetary impact between USD 6.6 to USD 82.4 billion annually in the United States alone. Forty percent of delirium episodes are preventable, and accurate risk stratification can decrease the incidence and improve clinical outcomes in patients. A previously developed clinical prediction model (the SORG Orthopaedic Research Group hip fracture delirium machine-learning algorithm) is highly accurate on internal validation (in 28,207 patients with hip fractures aged 60 years or older in a US cohort) in identifying at-risk patients, and it can facilitate the best use of preventive interventions; however, it has not been tested in an independent population. For an algorithm to be useful in real life, it must be valid externally, meaning that it must perform well in a patient cohort different from the cohort used to "train" it. With many promising machine-learning prediction models and many promising delirium models, only few have also been externally validated, and even fewer are international validation studies. Question/purpose Does the SORG hip fracture delirium algorithm, initially trained on a database from the United States, perform well on external validation in patients aged 60 years or older in Australia and New Zealand? Methods We previously developed a model in 2021 for assessing risk of delirium in hip fracture patients using records of 28,207 patients obtained from the American College of Surgeons National Surgical Quality Improvement Program. 
Variables included in the original model included age, American Society of Anesthesiologists (ASA) class, functional status (independent or partially or totally dependent for any activities of daily living), preoperative dementia, preoperative delirium, and preoperative need for a mobility aid. To assess whether this model could be applied elsewhere, we used records from an international hip fracture registry. Between June 2017 and December 2018, 6672 patients older than 60 years of age in Australia and New Zealand were treated surgically for a femoral neck, intertrochanteric hip, or subtrochanteric hip fracture and entered into the Australian & New Zealand Hip Fracture Registry. Patients were excluded if they had a pathological hip fracture or septic shock. Of all patients, 6% (402 of 6672) did not meet the inclusion criteria, leaving 94% (6270 of 6672) of patients available for inclusion in this retrospective analysis. Seventy-one percent (4249 of 5986) of patients were aged 80 years or older, after accounting for 5% (284 of 6270) of missing values; 68% (4292 of 6266) were female, after accounting for 0.06% (4 of 6270) of missing values, and 83% (4690 of 5661) of patients were classified as ASA III/IV, after accounting for 10% (609 of 6270) of missing values. Missing data were imputed using the missForest methodology. In total, 39% (2467 of 6270) of patients developed postoperative delirium. The performance of the SORG hip fracture delirium algorithm on the validation cohort was assessed by discrimination, calibration, Brier score, and a decision curve analysis. Discrimination, known as the area under the receiver operating characteristic curves (c-statistic), measures the model's ability to distinguish patients who achieved the outcomes from those who did not and ranges from 0.5 to 1.0, with 1.0 indicating the highest discrimination score and 0.50 the lowest. 
Calibration plots the predicted versus the observed probabilities; a perfect plot has an intercept of 0 and a slope of 1. The Brier score calculates a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest. Results The SORG hip fracture algorithm, when applied to an external patient cohort, distinguished between patients at low risk and patients at moderate to high risk of developing postoperative delirium. The SORG hip fracture algorithm performed with a c-statistic of 0.74 (95% confidence interval 0.73 to 0.76). The calibration plot showed high accuracy in the lower predicted probabilities (intercept -0.28, slope 0.52), and the Brier score was 0.22 (the null model Brier score was 0.24). The decision curve analysis showed that the model can be beneficial compared with no model or compared with characterizing all patients as at risk for developing delirium. Conclusion Algorithms developed with machine learning are a potential tool for refining treatment of at-risk patients. If high-risk patients can be reliably identified, resources can be appropriately directed toward their care. Although the current iteration of SORG should not be relied on for patient care, it suggests potential utility in assessing risk. Further assessment in different populations, made easier by international collaborations and standardization of registries, would be useful in the development of universally valid prediction models. The model can be freely accessed at: https://sorg-apps.shinyapps.io/hipfxdelirium/
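
The discrimination and Brier score definitions in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy data are assumptions made for illustration, and only the Python standard library is used. The null model here predicts the observed prevalence for every patient, so its Brier score equals prevalence × (1 − prevalence); for a prevalence of 0.39 that is about 0.24, in line with the null model value quoted above.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Share of (positive, negative) pairs in which the positive case
    received the higher predicted risk; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    score = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return score / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier_score(y_true):
    """Brier score of a model that predicts the prevalence for everyone."""
    prevalence = sum(y_true) / len(y_true)
    return brier_score(y_true, [prevalence] * len(y_true))


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(outcomes, predicted))
print(brier_score(outcomes, predicted))
print(null_brier_score(outcomes))
```

A model adds information only when its Brier score falls below the null model's, which is why the two values are reported side by side.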

    Do symptoms of anxiety and/or depression and pain intensity before primary Total knee arthroplasty influence reason for revision? Results of an observational study from the Dutch arthroplasty register in 56,233 patients

    Objective: Anxiety, depression and greater pain intensity before total knee arthroplasty (TKA) may increase the probability of revision surgery for remaining symptoms even without clear pathology or technical issues. We aimed to assess whether preoperative anxiety/depression and pain intensity are associated with revision TKA for less clear indications. Methods: Less clear indications for revision were defined after a Delphi process in which consensus was reached among 59 orthopaedic knee experts. We performed a Cox regression analysis on primary TKA patients registered in the Dutch Arthroplasty Registry (LROI) who completed the EuroQol 5D 3L (EQ5D-3L) anxiety/depression score to examine associations between preoperative anxiety/depression and pain (Numeric Rating Scale (NRS)) with TKA revision for less clear reasons. These analyses were adjusted for age, BMI, sex, smoking, ASA score, EQ5D-3L thermometer and OKS score. Results: In total, 25.9% of the 56,233 included patients reported moderate or severe symptoms of anxiety/depression on the EQ5D-3L anxiety/depression score. Of those, 615 revisions (45.5%) were performed for less clear reasons for revision (patellar pain, malalignment, instability, progression of osteoarthritis or arthrofibrosis). Not EQ5D-3L anxiety/depression score, but higher NRS pain at rest and EQ5D-3L pain score were associated with revision for less clear reason (HR: 1.058, 95% CI 1.019-1.099 & HR: 1.241, 95% CI 1.044-1.476, respectively). Conclusion: Our findings suggest that pain intensity is a risk factor for TKA revision for a less clear reason. The finding that preoperative pain intensity was associated with reason for revision confirms a likely influence of subjective, personal factors on offer and acceptance of TKA revision.
The association between anxiety/depression and reason for revision after TKA may also be found when including more specific outcome measures to assess anxiety/depression and we therefore hope to encourage further research on this topic with our study, ideally in a prospective setting. Study design: Longitudinal Cohort Study Level III, Delphi Consensus

    Can We Geographically Validate a Natural Language Processing Algorithm for Automated Detection of Incidental Durotomy Across Three Independent Cohorts From Two Continents?

    Background Incidental durotomy is an intraoperative complication in spine surgery that can lead to postoperative complications, increased length of stay, and higher healthcare costs. Natural language processing (NLP) is an artificial intelligence method that assists in understanding free-text notes that may be useful in the automated surveillance of adverse events in orthopaedic surgery. A previously developed NLP algorithm is highly accurate in the detection of incidental durotomy on internal validation and external validation in an independent cohort from the same country. External validation in a cohort with linguistic differences, referred to as geographical validation, is required to assess the transportability of the developed algorithm. Ideally, the performance of a prediction model, the NLP algorithm, is constant across geographic regions to ensure reproducibility and model validity. Question/purpose Can we geographically validate an NLP algorithm for the automated detection of incidental durotomy across three independent cohorts from two continents? Methods Patients 18 years or older undergoing a primary procedure of (thoraco)lumbar spine surgery were included. In Massachusetts, between January 2000 and June 2018, 1000 patients were included from two academic and three community medical centers. In Maryland, between July 2016 and November 2018, 1279 patients were included from one academic center, and in Australia, between January 2010 and December 2019, 944 patients were included from one academic center. The authors retrospectively studied the free-text operative notes of included patients for the primary outcome that was defined as intraoperative durotomy. Incidental durotomy occurred in 9% (93 of 1000), 8% (108 of 1279), and 6% (58 of 944) of the patients, respectively, in the Massachusetts, Maryland, and Australia cohorts. No missing reports were observed.
Three datasets (Massachusetts, Australian, and combined Massachusetts and Australian) were divided into training and holdout test sets in an 80:20 ratio. An extreme gradient boosting (an efficient and flexible tree-based algorithm) NLP algorithm was individually trained on each training set, and the performance of the three NLP algorithms (respectively American, Australian, and combined) was assessed by discrimination via area under the receiver operating characteristic curves (AUC-ROC; this measures the model's ability to distinguish patients who obtained the outcomes from those who did not), calibration metrics (which plot the predicted and the observed probabilities) and Brier score (a composite of discrimination and calibration). In addition, the sensitivity (true positives, recall), specificity (true negatives), positive predictive value (also known as precision), negative predictive value, F1-score (composite of precision and recall), positive likelihood ratio, and negative likelihood ratio were calculated. Results The combined NLP algorithm (the combined Massachusetts and Australian data) achieved excellent performance on independent testing data from Australia (AUC-ROC 0.97 [95% confidence interval 0.87 to 0.99]), Massachusetts (AUC-ROC 0.99 [95% CI 0.80 to 0.99]) and Maryland (AUC-ROC 0.95 [95% CI 0.93 to 0.97]). The NLP developed based on the Massachusetts cohort had excellent performance in the Maryland cohort (AUC-ROC 0.97 [95% CI 0.95 to 0.99]) but worse performance in the Australian cohort (AUC-ROC 0.74 [95% CI 0.70 to 0.77]). Conclusion We demonstrated the clinical utility and reproducibility of an NLP algorithm with combined datasets retaining excellent performance in individual countries relative to algorithms developed in the same country alone for detection of incidental durotomy.
Further multi-institutional, international collaborations can facilitate the creation of universal NLP algorithms that improve the quality and safety of orthopaedic surgery globally. The combined NLP algorithm has been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/nlp_incidental_durotomy/. Clinicians and researchers can use the tool to help incorporate the model in evaluating spine registries or quality and safety departments to automate detection of incidental durotomy and optimize prevention efforts
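
The classification metrics listed in the abstract above all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives.
    """
    sensitivity = tp / (tp + fn)   # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Likelihood ratios; LR+ is unbounded when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if fp else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts for illustration only.
for name, value in classification_metrics(tp=8, fp=2, fn=2, tn=88).items():
    print(f"{name}: {value:.3f}")
```

When an outcome is uncommon, as incidental durotomy is in these cohorts, the predictive values and F1-score are more informative than overall accuracy.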

    Do injured adolescent athletes and their parents agree on the athletes’ level of psychologic and physical functioning?

    Background Although a parent’s perception of his or her child’s physical and emotional functioning may influence the course of the child’s medical care, including access to care and decisions regarding treatment options, no studies have investigated whether the perceptions of a parent are concordant with that of an adolescent diagnosed with a sports-related orthopaedic injury. Identifying and understanding the potential discordance in coping and emotional distress within the athlete adolescent-parent dyads are important, because this discordance may have negative effects on adolescents’ well-being. Questions/purposes The purposes of this study were (1) to compare adolescent and parent proxy ratings of psychologic symptoms (depression and anxiety), coping skills (catastrophic thinking about pain and pain self-efficacy), and upper extremity physical function and mobility in a population of adolescent-parent dyads in which the adolescent had a sport-related injury; and (2) to compare scores of adolescents and parent proxies with normative scores when such are available. Methods We enrolled 54 dyads (eg, pairs) of adolescent patients (mean age 16 years; SD = 1.6) presenting to a sports medicine practice with sports-related injuries as well as their accompanying parent(s). We used Patient-reported Outcomes Measurement Information System questionnaires to measure adolescents’ depression, anxiety, upper extremity physical function, and mobility. We used the Pain Catastrophizing Scale short form to assess adolescents’ catastrophic thinking about pain and the Pain Self-efficacy Scale short form to measure adolescents’ pain self-efficacy. The accompanying parent, 69% mothers (37 of 54) and 31% fathers (17 of 54), completed parent proxy versions of each questionnaire. 
Results Parents reported that their children had worse scores (47 ± 9) on depression than what the children themselves reported (43 ± 9; mean difference 4.0; 95% confidence interval [CI], -7.0 to 0.91; p = 0.011; medium effect size -0.47). Also, parents reported that their children engaged in catastrophic thinking about pain to a lesser degree (8 ± 5) than what the children themselves reported (13 ± 4; mean difference 4.5; 95% CI, 2.7-6.4; p < 0.001; large effect size 1.2). Because scores on depression and catastrophic thinking were comparable to the general population, and minimal clinically important difference scores are not available for these measures, it is unclear whether the relatively small observed differences between parents’ and adolescents’ ratings are clinically meaningful. Parents and children were concordant on their reports of the child’s upper extremity physical function (patient perception 47 ± 10, parent proxy 47 ± 8, mean difference -0.43, p = 0.70), mobility (patient perception 43 ± 9, parent proxy 44 ± 9, mean difference -0.59, p = 0.64), anxiety (patient perception 43 ± 10, parent proxy 46 ± 8, mean difference -2.1, p = 0.21), and pain self-efficacy (patient perception 16 ± 5, parent proxy 15 ± 5, mean difference 0.70, p = 0.35). Conclusions Parents rated their children as more depressed and engaging in less catastrophic thinking about pain than the adolescents rated themselves. Although these differences are statistically significant, they are of a small magnitude making it unclear as to how clinically important they are in practice. We recommend that providers keep in mind that parents may overestimate depressive symptoms and underestimate the catastrophic thinking about pain in their children, probe for these potential differences, and consider how they might impact medical care.

    Development of a postoperative delirium risk scoring tool using data from the Australian and New Zealand Hip Fracture Registry: an analysis of 6672 patients 2017-2018

    Background and purpose: This study aimed to determine the incidence and predictors of postoperative delirium and to develop a post-surgery delirium risk scoring tool. Patients and Methods: A total of 6672 hip fracture patients with documented assessment for delirium were analyzed from the Australia and New Zealand Hip Fracture Registry between June 2017 and December 2018. Thirty-six variables for the prediction of delirium using univariate and multivariate logistic regression were assessed. The models were assessed for diagnostic accuracy using C-statistic and calibration using Hosmer-Lemeshow goodness-of-fit test. A Delirium Risk Score was developed based on the regression coefficients. Results: Delirium developed in 2599/6672 (39.0%) hip fracture patients. Seven independent predictors of delirium were identified; age above 80 years (OR=1.6 CI 1.4-1.9; p=0.001), male (OR=1.3 CI 1.1-1.5; p=0.007), absent pre-operative cognitive assessment (OR=1.5 CI 1.3-1.9; p=0.001), impaired pre-operative cognitive state (OR=1.7 CI 1.3-2.1; p=0.001), surgery delay (OR=1.7 CI 1.2-2.5; p=0.002) and mobilisation day 1 post-surgery (OR=1.9 CI 1.4-2.6; p=0.001). The C-statistics for the training and validation datasets were 0.74 and 0.75, respectively. Calibration was good (χ2=35.72 (9); p<0.001). The Delirium Risk Score for patients ranged from 0 to 42 in the validation data and when used alone as a risk predictor, had similar levels of diagnostic accuracy (C-statistic=0.742) indicating its potential for use as a stand-alone risk scoring tool. Conclusion: We have designed and validated a delirium risk score for predicting delirium following surgery for a hip fracture using seven predicting factors. This could assist clinicians in identifying high risk patients requiring higher levels of observation and post-surgical care.

    Feasibility of Machine Learning and Logistic Regression Algorithms to Predict Outcome in Orthopaedic Trauma Surgery

    Background: Statistical models using machine learning (ML) have the potential for more accurate estimates of the probability of binary events than logistic regression. The present study used existing data sets from large musculoskeletal trauma trials to address the following study questions: (1) Do ML models produce better probability estimates than logistic regression models? (2) Are ML models influenced by different variables than logistic regression models? Methods: We created ML and logistic regression models that estimated the probability of a specific fracture (posterior malleolar involvement in distal spiral tibial shaft and ankle fractures, scaphoid fracture, and distal radial fracture) or adverse event (subsequent surgery [after distal biceps repair or tibial shaft fracture], surgical site infection, and postoperative delirium) using 9 data sets from published musculoskeletal trauma studies. Each data set was split into training (80%) and test (20%) subsets. Fivefold cross-validation of the training set was used to develop the ML models. The best-performing model was then assessed in the independent testing data. Performance was assessed by (1) discrimination (c-statistic), (2) calibration (slope and intercept), and (3) overall performance (Brier score). Results: The mean c-statistic was 0.01 higher for the logistic regression models compared with the best ML models for each data set (range, -0.01 to 0.06). There were fewer variables strongly associated with variation in the ML models, and many were dissimilar from those in the logistic regression models. Conclusions: The observation that ML models produce probability estimates comparable with logistic regression models for binary events in musculoskeletal trauma suggests that their benefit may be limited in this context
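
The 80:20 split followed by fivefold cross-validation of the training subset, as described in the abstract above, can be sketched with index lists. This is an illustrative outline under assumptions, not the study's pipeline: the function names, the fixed seed, and the interleaved fold assignment are choices made here.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices 0..n-1 and hold out a test fraction."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(indices, k=5):
    """Yield (training, validation) index lists for each of k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield training, validation


# Hypothetical data set of 100 rows.
train, test = train_test_split_indices(100)
for fold_number, (fit_rows, val_rows) in enumerate(k_fold(train), start=1):
    print(fold_number, len(fit_rows), len(val_rows))
```

The held-out test rows are never touched during cross-validation, so the final performance estimate stays independent of model selection.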